Add robust eigh_v2 problem by msaroufim · Pull Request #163 · gpu-mode/reference-kernels

msaroufim · 2026-06-30T22:28:22Z

Summary

Adds a separate eigh_v2 linalg problem and leaderboard entry without changing the existing eigh implementation or rankings.
Carries forward the important robustness ideas from the related eigh PRs: benchmark/accuracy balancing from #156 (Improve eigh accuracy and benchmark balance) by @msaroufim, and the roofline floor, fresh timed inputs, and output-deferral rejection by @robobryce from #159 (eigh_py: reject physically-impossible benchmark times (roofline floor)), #160 (eigh_py: regenerate a fresh input each timed benchmark iteration), and #161 (eigh_py: reject output-object deferral in the correctness check).
Does not carry the old profiler-capture cherry-pick from #157 (Wrap timed custom_kernel launches in a cudaProfiler capture range (qr_v2, eigh_py)) because profile mode is already present on main via #158 (Add Eigh profile mode).
Adds block-diagonal correctness cases with varied block sizes, plus additional 512-shape benchmark distributions so precision choices need to respond to matrix quality rather than only public shape IDs.
Simplifies the checker to the non-redundant hard gates: eigen-equation residual, eigenvalue error against torch.linalg.eigvalsh(A), and orthogonality. Reconstruction is skipped because it follows for square orthonormal Q when the eigen-equation holds.
Keeps ranked runtime practical by making leaderboard mode use the same rechecked benchmark path instead of the previous 1000-repeat ranked loop.
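
The claim that reconstruction is redundant can be written out in two steps. This is a sketch under stated assumptions, not the checker's actual code: A is taken to be real symmetric, Λ is notation introduced here for the diagonal matrix of returned eigenvalues, and the orthogonality gate is treated as exact. With finite tolerances the same identity holds only up to the corresponding error.

```latex
% Orthogonality gate, for square Q:
Q^\top Q = I,\quad Q \in \mathbb{R}^{n \times n}
\;\Longrightarrow\; Q Q^\top = I

% Eigen-equation gate, combined with the line above:
A - Q \Lambda Q^\top = (A Q - Q \Lambda)\, Q^\top
\;\Longrightarrow\;
\lVert A - Q \Lambda Q^\top \rVert_F = \lVert A Q - Q \Lambda \rVert_F
```

So a small eigen-equation residual already bounds the reconstruction error, which is why a separate reconstruction gate would add no information once the other two pass.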

Validation

python3 -m py_compile problems/linalg/eigh_v2/eval.py problems/linalg/eigh_v2/reference.py problems/linalg/eigh_v2/task.py problems/linalg/eigh_v2/submissions/torch_eigh.py problems/linalg/eigh_v2/submissions/triton_diagonal_fast_path.py
/Users/mark/Dev/kernelbot/.venv/bin/ruff check problems/linalg/eigh_v2
git diff --check
Local-only KernelBot debug setup against kernelbot_eigh_v2_debug on 127.0.0.1, with the local checkout registered through PROBLEM_DEV_DIR / PROBLEMS_REPO.
Baseline torch_eigh.py local submissions on B200:
- test: 41/41 pass
- benchmark: 10/10 pass
- real leaderboard submission after repeat-budget fix: pass, about 116s end-to-end locally; recorded phase durations included test at 7-10s, benchmark at 34-44s, and leaderboard at 44-48s.
Adversarial local submissions:
- Tensor subclass/output deferral failed in the evaluator with the error "Q must be a plain torch.Tensor".
- Cache/replay and harness timing patch attempts were rejected by KernelGuard on the normal local API.
- With KernelGuard disabled only on a separate local debug API, cache/replay still failed evaluator correctness, and forged CUDA-event timing failed the new physical roofline floor.
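
The idea behind a physical roofline floor can be illustrated with a minimal sketch. The PR does not show its formula, so everything here is an assumption: the function names are hypothetical, the floor counts only the memory traffic for reading A and writing Q plus the eigenvalues, and the default bandwidth is a placeholder rather than a B200 specification.

```python
def roofline_floor_seconds(n, bytes_per_element=4, peak_bytes_per_second=1.0e12):
    """Lower bound on runtime from memory traffic alone (hypothetical model).

    Assumes the kernel must at least read the n x n input once and write
    the n x n eigenvector matrix plus n eigenvalues once.
    """
    bytes_read = n * n * bytes_per_element
    bytes_written = (n * n + n) * bytes_per_element
    return (bytes_read + bytes_written) / peak_bytes_per_second


def is_physically_plausible(reported_seconds, n, **model):
    """Reject any reported time that is below the memory-traffic floor."""
    return reported_seconds >= roofline_floor_seconds(n, **model)


if __name__ == "__main__":
    floor = roofline_floor_seconds(512)
    print(f"floor for n=512: {floor:.3e} s")
    print("1 ns plausible?", is_physically_plausible(1e-9, 512))
    print("1 ms plausible?", is_physically_plausible(1e-3, 512))
```

A forged timing that reports less than the floor is rejected regardless of how it was produced, which is what makes this kind of check useful against patched timers.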

Provenance

Resolved problem directory: problems/linalg/eigh_v2. Ranked/profile shapes come from the benchmarks: section of eigh_v2/task.yml. Profile mode wraps the submitted kernel in the upstream custom_kernel NVTX region. Reference-kernels base used for this PR: origin/main at 4a1153e, with this branch at 9bcefc4.

Add a separate eigh_v2 leaderboard that keeps the existing eigh problem untouched while carrying the stricter checker and benchmark-integrity hardening from the open eigh follow-ups.

The v2 evaluator regenerates inputs for scored benchmark iterations, rejects physically impossible reported times, and keeps profile mode from the current upstream evaluator. The v2 checker requires plain tensor outputs and adds an explicit eigenvalue comparison against torch.linalg.eigvalsh(A).

The ranked set is trimmed to ten cases and repeats the central 512x512 shape across dense, mixed, rank-deficient, clustered, and row-scaled distributions so shape-only precision routing is less useful than inspecting matrix quality.

Credit: this consolidates ideas and fixes from #156, #159, #160, and #161.

Co-Authored-By: Bryce Adelstein Lelbach <brycelelbach@gmail.com>
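
Regenerating inputs for each scored iteration can be sketched as follows. This is a CPU-only stand-in with hypothetical names (make_input, benchmark); the actual evaluator works on GPU tensors and its timing mechanism is not shown in this PR description. The sketch only illustrates that each timed call sees an input it has not seen before, so a kernel that caches and replays a previous result gains nothing.

```python
import random
import time


def make_input(n, seed):
    """Hypothetical generator: a fresh symmetric n x n matrix as nested lists."""
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    # Symmetrize so the matrix is a valid input for a symmetric eigensolver.
    return [[0.5 * (a[i][j] + a[j][i]) for j in range(n)] for i in range(n)]


def benchmark(kernel, n, iterations, base_seed=0):
    """Time `kernel` on a different input each iteration.

    Input generation happens outside the timed region, so only the
    kernel call itself contributes to the recorded duration.
    """
    durations = []
    for i in range(iterations):
        data = make_input(n, base_seed + i)  # new seed, new matrix
        start = time.perf_counter()
        kernel(data)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    times = benchmark(lambda m: sum(map(sum, m)), n=8, iterations=3)
    print(times)
```

In a real harness the outputs from these timed iterations would also be rechecked for correctness, as the summary describes for the rechecked benchmark path.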

msaroufim force-pushed the qr-v2-conditioning-hardening branch from 40ca746 to b208d15 on June 30, 2026 22:31

msaroufim force-pushed the qr-v2-conditioning-hardening branch from 133bded to 9bcefc4 on July 5, 2026 03:34

msaroufim changed the title from "Document QR v2 conditioning hardening" to "Add robust eigh_v2 problem" on Jul 5, 2026

msaroufim wants to merge 1 commit into main from qr-v2-conditioning-hardening
